Skip to content

Feat add matrix market load#8

Open
VanyaGlazunov wants to merge 3 commits intomainfrom
feat-add-MatrixMarket-load
Open

Feat add matrix market load#8
VanyaGlazunov wants to merge 3 commits intomainfrom
feat-add-MatrixMarket-load

Conversation

@VanyaGlazunov
Copy link
Collaborator

The patch adds support for loading graphs from MatrixMarket format. We assume this structure

vertices.txt — one line per node: <node_name> <1-based-index>
edges.txt — one line per label: <label_name> <1-based-index>
1.txt — MM adjacency matrix for the label with index 1
2.txt — MM adjacency matrix for the label with index 2

Also takes graph from la-n-egg for tests.
Hopefully closes #4

@VanyaGlazunov VanyaGlazunov force-pushed the feat-add-MatrixMarket-load branch from 75f7a1c to 0387a23 Compare March 19, 2026 04:32
@VanyaGlazunov VanyaGlazunov force-pushed the feat-add-MatrixMarket-load branch from 0387a23 to 9fea5d9 Compare March 19, 2026 04:33
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we differ matrix formats and queries formats? Maybe it'll be good refactoring:
formats/queries/... --- for csv and nt
formats/datasets/.. --- for mm

Copy link
Collaborator

@suvorovrain suvorovrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think It'll be handy to support some data catalogs about matrices of input dataset. For example --- number of non zero rows or columns. Maybe it should be added in different PR

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new MatrixMarket “directory graph” format to pathrex (vertices/edges mappings + per-label adjacency matrices), wires it into the InMemory backend via GraphSource, and introduces integration tests + testdata based on the la-n-egg dataset.

Changes:

  • Introduce formats::mm (MatrixMarket, mapping parser, MatrixMarket matrix loader via LAGraph_MMRead) and extend FormatError accordingly.
  • Extend InMemoryBuilder/InMemoryGraph to support installing a pre-defined node map and loading pre-built GraphBLAS matrices by label.
  • Add MatrixMarket integration tests and a MatrixMarket test dataset under tests/testdata/mm_graph.

Reviewed changes

Copilot reviewed 13 out of 18 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/testdata/mm_graph/edges.txt Adds label→index mapping for the MatrixMarket test dataset.
tests/testdata/mm_graph/9.txt Adds one label’s adjacency matrix in MatrixMarket format.
tests/testdata/mm_graph/8.txt Adds one label’s adjacency matrix in MatrixMarket format.
tests/testdata/mm_graph/6.txt Adds one label’s adjacency matrix in MatrixMarket format.
tests/testdata/mm_graph/5.txt Adds one label’s adjacency matrix in MatrixMarket format.
tests/testdata/mm_graph/4.txt Adds one label’s adjacency matrix in MatrixMarket format.
tests/testdata/mm_graph/3.txt Adds one label’s adjacency matrix in MatrixMarket format.
tests/mm_tests.rs Adds integration tests for MatrixMarket loading and mappings.
src/graph/mod.rs Minor formatting change to check_graph signature formatting.
src/graph/inmemory.rs Adds MatrixMarket GraphSource impl; changes node ID storage to support pre-assigned IDs; supports ingesting prebuilt matrices.
src/formats/mod.rs Adds mm module + new FormatError variants for MatrixMarket and invalid mapping formats.
src/formats/mm.rs Implements MatrixMarket directory support (MatrixMarket, load_mm_file, parse_index_map).
Cargo.toml Adds tempfile for unit tests in formats::mm.
AGENTS.md Updates repository documentation to mention MatrixMarket support.
Comments suppressed due to low confidence (1)

src/graph/inmemory.rs:200

  • num_nodes() returns id_to_node.len(), which is incorrect if IDs are sparse/pre-assigned (the graph’s matrix dimension would be max_id + 1). If sparse IDs are supported, store/compute the actual node domain size (e.g., next_id or max_id + 1) and return that instead to keep it consistent with matrix dimensions.
    fn num_nodes(&self) -> usize {
        self.id_to_node.len()
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +79 to +83

assert!(graph.get_node_id("<NonexistentNode>").is_none(),);

assert!(graph.get_node_name(999999).is_none(),);
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assert!(...,); calls include a trailing comma with no message tokens, which will cause a compile error for the assert! macro. Remove the trailing commas (or add a message).

Copilot uses AI. Check for mistakes.
Comment on lines +185 to +187

// Try to get a node name for a valid high index
let high_index = num_nodes;
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high_index is set to num_nodes, but matrix/node indices are typically in the range 0..num_nodes. If get_node_id() is meant to return a matrix index (per GraphDecomposition docs), this should likely use num_nodes - 1 and the earlier expected IDs should be adjusted accordingly (i.e., convert 1-based input files to 0-based indices internally).

Suggested change
// Try to get a node name for a valid high index
let high_index = num_nodes;
assert!(num_nodes > 0, "Graph should contain at least one node");
// Try to get a node name for a valid high index (largest valid index)
let high_index = num_nodes - 1;

Copilot uses AI. Check for mistakes.
reason: format!("index '{}' is not a valid positive integer", idx_str.trim()),
})?;

let name = name.trim().to_owned();
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_index_map() overwrites entries on duplicate names or indices (HashMap::insert), which can silently corrupt the mapping files. It would be safer to detect duplicates and return FormatError::InvalidFormat identifying the conflicting line.

Suggested change
let name = name.trim().to_owned();
let name = name.trim().to_owned();
if by_idx.contains_key(&idx) {
return Err(FormatError::InvalidFormat {
file: file_name.clone(),
line: line_no + 1,
reason: format!(
"duplicate index '{}' encountered; each index must be unique",
idx
),
});
}
if by_name.contains_key(&name) {
return Err(FormatError::InvalidFormat {
file: file_name.clone(),
line: line_no + 1,
reason: format!(
"duplicate name '{}' encountered; each name must be unique",
name
),
});
}

Copilot uses AI. Check for mistakes.
Comment on lines +219 to +237
[`MatrixMarket`](src/formats/mm.rs:160) loads an edge-labeled graph from a directory with:

- `vertices.txt` — one line per node: `<node_name> <1-based-index>`
- `edges.txt` — one line per label: `<label_name> <1-based-index>`
- `<n>.txt` — MatrixMarket adjacency matrix for label with index `n`

The loader uses [`LAGraph_MMRead`](src/lagraph_sys.rs) to parse each `.txt` file into a
`GrB_Matrix`, then wraps it in an `LAGraph_Graph`. Node IDs from `vertices.txt` are
installed via [`InMemoryBuilder::set_node_map()`](src/graph/inmemory.rs:67).

Helper functions:

- [`load_mm_file(path)`](src/formats/mm.rs:64) — reads a single MatrixMarket file into a
`GrB_Matrix`.
- [`parse_index_map(path)`](src/formats/mm.rs:99) — parses `<name> <index>` mapping files.

`MatrixMarket` implements `GraphSource<InMemoryBuilder>` at
[`src/graph/inmemory.rs:429`](src/graph/inmemory.rs:429).

Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AGENTS.md references several src/formats/mm.rs and src/graph/inmemory.rs line numbers (e.g., MatrixMarket at src/formats/mm.rs:160, load_mm_file at :64, GraphSource impl at src/graph/inmemory.rs:429) that no longer match the current code. Please update these pointers so the doc remains navigable.

Copilot uses AI. Check for mistakes.
file: file_name.clone(),
line: line_no + 1,
reason: format!("index '{}' is not a valid positive integer", idx_str.trim()),
})?;
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_index_map() accepts an index of 0 (since it only parses usize), but the documented format is explicitly 1-based. Consider validating idx >= 1 (and potentially rejecting negative/zero) so malformed mapping files fail fast with FormatError::InvalidFormat instead of producing inconsistent node IDs / filenames.

Suggested change
})?;
})?;
if idx == 0 {
return Err(FormatError::InvalidFormat {
file: file_name.clone(),
line: line_no + 1,
reason: format!(
"index '{}' must be >= 1 (1-based indexing)",
idx_str.trim()
),
});
}

Copilot uses AI. Check for mistakes.
Comment on lines +62 to +65
for label in expected_labels {
let result = graph.get_graph(label);
assert!(result.is_ok(),);
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert!(result.is_ok(),); uses a trailing comma with no message tokens, which does not match assert! macro patterns and will fail to compile. Remove the trailing comma or provide an actual assertion message.

Copilot uses AI. Check for mistakes.
Comment on lines +221 to +222
builder.set_node_map(vert_by_idx, vert_by_name);

Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MatrixMarket vertices.txt uses 1-based indices, but GraphDecomposition::get_node_id() is documented as returning a matrix index (GraphBLAS is 0-based). As written, set_node_map() installs 1-based IDs directly, which makes get_node_name(0) unmapped and risks off-by-one errors when indexing GraphBLAS matrices. Consider normalizing vertex indices to 0-based here (subtract 1) and validating that the resulting IDs are contiguous.

Suggested change
builder.set_node_map(vert_by_idx, vert_by_name);
// MatrixMarket vertex indices are 1-based, but GraphBLAS indices are 0-based.
// Normalize all vertex IDs to 0-based before installing the node map.
let mut vert_by_idx0 = HashMap::with_capacity(vert_by_idx.len());
for (idx, name) in vert_by_idx {
// Assumes MatrixMarket provides indices starting at 1.
vert_by_idx0.insert(idx - 1, name);
}
let mut vert_by_name0 = HashMap::with_capacity(vert_by_name.len());
for (name, idx) in vert_by_name {
vert_by_name0.insert(name, idx - 1);
}
builder.set_node_map(vert_by_idx0, vert_by_name0);

Copilot uses AI. Check for mistakes.
Comment on lines +225 to +228
if path.exists() {
let matrix = load_mm_file(&path)?;
builder.push_grb_matrix(label, matrix)?;
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MatrixMarket loader silently skips a label if its <idx>.txt file is missing. Given edges.txt declares the available labels, this can hide incomplete datasets and produce partially-loaded graphs without any error. Consider returning an error when an expected matrix file is absent (or at least make this behavior explicit/configurable).

Suggested change
if path.exists() {
let matrix = load_mm_file(&path)?;
builder.push_grb_matrix(label, matrix)?;
}
let matrix = load_mm_file(&path)?;
builder.push_grb_matrix(label, matrix)?;

Copilot uses AI. Check for mistakes.
Copy link
Member

@georgiy-belyanin georgiy-belyanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked through and the code seems nice. Check some copilot nitpicks and feel free to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Loading graphs as a set of Matrix-Market files

4 participants